Cross-language analysis of world regions in the press

An empirical approach based on Wikidata

Claude Grasland & Etienne Toureille

19/10/2021

1. INTRODUCTION

Previous analyses of German (left) and French (right) newspapers have demonstrated the value of analysing networks of states and world regions:

But before validating these results, we need:

  1. to clarify our definition of world regions and the associated list of target units.
  2. to extend the dictionary to other languages (Turkish, Arabic, English).

Objectives

The objective of this short note is to explore the potential of Wikidata for producing multilingual dictionaries of world regions and, more generally, of regional imaginations. We consider different types of “regions”, related either to the division of the Earth (“natural”) or to the division of the World (“political”).

But the difference between the two is not clear-cut: see Grataloup (2011), Lewis and Wigen (2019), Copeaux (1997), Brennetot and Rosemberg (2013).

Earth/Natural regions: Atlas

So-called “physical maps” in atlases are a good source:

World/Political regions: IGO

Source: https://commons.wikimedia.org/wiki/Atlas_of_international_organizations

World/Political regions: Other …

A cross-language perspective

We propose to establish a dictionary of Earth and World regions in the five languages of interest for the IMAGEUN project (German, English, French, Turkish and Arabic).

We want to avoid any “eurocentric” or “anglocentric” perspective in the definition of entities. Our definition of entities will therefore follow three rules:

  1. Non-universal: entities will not necessarily be available in all languages.
  2. Non-equivalent: translation of names does not imply equivalence of entities.
  3. Non-hierarchical: an entity has a different definition in each language. No language can be considered as the “pivot” or “reference”.

Entities equivalences and lexical universes

To summarize, we propose to build partial equivalences between entities that belong to different lexical universes.

The comparison between lexical universes will necessarily be limited to a small sample of entities that we can assume to be approximately equivalent.
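The three rules and the idea of partial equivalence can be pictured as a data structure. The following is only a minimal sketch: the names `LexicalEntry`, `Dictionary` and `comparable_entities` are hypothetical and do not come from the project. Each entity code maps to one entry per language, a language may simply be absent (rule 1), each entry carries its own definition (rule 2), and no language is stored as a reference (rule 3). The labels are taken from the top-20 table in section 3, where the Turkish label of Q2841453 is missing (NA).

```python
from dataclasses import dataclass, field


@dataclass
class LexicalEntry:
    """How one language names and defines an entity (hypothetical structure)."""
    label: str
    aliases: list[str] = field(default_factory=list)
    definition: str = ""


# entity code -> language code -> entry; a missing language means "not available"
Dictionary = dict[str, dict[str, LexicalEntry]]


def comparable_entities(dictionary: Dictionary, languages: list[str]) -> list[str]:
    """Return the codes of entities that have an entry in every requested language."""
    return sorted(
        code
        for code, entries in dictionary.items()
        if all(lang in entries for lang in languages)
    )


regions: Dictionary = {
    "Q15": {
        "de": LexicalEntry("Afrika"),
        "en": LexicalEntry("Africa", aliases=["African continent"]),
        "fr": LexicalEntry("Afrique"),
        "tr": LexicalEntry("Afrika"),
    },
    # No Turkish entry: the entity is not available in every language.
    "Q2841453": {
        "de": LexicalEntry("Amazonien"),
        "en": LexicalEntry("Amazonia"),
        "fr": LexicalEntry("Amazonie"),
    },
}

four_languages = comparable_entities(regions, ["de", "en", "fr", "tr"])
three_languages = comparable_entities(regions, ["de", "en", "fr"])
```

With this structure, the sample of entities that can be compared shrinks as languages are added: here only Q15 remains once Turkish is required.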

2. WIKIDATA

Wikidata defines itself as

Codification of entities

The first benefit of Wikidata is that it provides unique identification codes for objects. For example, a search for “Africa” returns a list of different objects, each characterized by a unique code:
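Such a search can also be scripted. The sketch below assumes the public Wikidata API and its `wbsearchentities` action; the helper names `build_search_url` and `parse_candidates` are hypothetical, and the sample payload is hand-written to mimic the shape of a response, not a live result. No network call is made here.

```python
import json
from urllib.parse import urlencode

API_URL = "https://www.wikidata.org/w/api.php"


def build_search_url(term: str, language: str = "en", limit: int = 10) -> str:
    """Build a wbsearchentities request URL for a search term in one language."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


def parse_candidates(payload: str) -> list[tuple[str, str, str]]:
    """Extract (code, label, description) for each candidate in a JSON response."""
    data = json.loads(payload)
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in data.get("search", [])
    ]


# Hand-written sample shaped like an API response (illustrative only).
sample = json.dumps({
    "search": [
        {"id": "Q15", "label": "Africa", "description": "continent"},
        # Made-up code standing for a homonym that the coder must rule out.
        {"id": "Q999999999", "label": "Africa", "description": "hypothetical homonym"},
    ]
})

url = build_search_url("Africa")
candidates = parse_candidates(sample)
```

The list of candidates is then shown to the expert coder, who chooses the code that matches the intended region.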

Information on entities

Once we have selected an entity (e.g. Q15), we obtain a new page with more detailed information in English, but also in all the other languages available in Wikipedia.

Information on entities

A lot of information is available about the entity but, at this stage, the most important items for our research are:

  1. the translation in different languages
  2. the equivalent words or expressions in different languages
  3. the definitions in different languages
  4. the existence of ambiguities of meaning, for example homonyms.
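The first three items correspond to what Wikidata calls labels, aliases and descriptions. As a hedged sketch, the function below reads them from an entity record laid out like the JSON returned by the `wbgetentities` API action; the function name `summarize_entity` is hypothetical and the sample record is hand-written from the values quoted in this note. The fourth item (ambiguity) is left to human review.

```python
def summarize_entity(entity: dict, languages: list[str]) -> dict[str, dict]:
    """Collect label, aliases and description of an entity for each language."""
    summary = {}
    for lang in languages:
        label = entity.get("labels", {}).get(lang, {}).get("value")
        aliases = [a["value"] for a in entity.get("aliases", {}).get(lang, [])]
        description = entity.get("descriptions", {}).get(lang, {}).get("value")
        summary[lang] = {
            "label": label,              # item 1: translation
            "aliases": aliases,          # item 2: equivalent words or expressions
            "description": description,  # item 3: definition
        }
    return summary


# Hand-written record for Q15, limited to two languages (illustrative only).
q15 = {
    "labels": {
        "en": {"language": "en", "value": "Africa"},
        "de": {"language": "de", "value": "Afrika"},
    },
    "aliases": {
        "en": [{"language": "en", "value": "African continent"}],
    },
    "descriptions": {
        "en": {"language": "en",
               "value": "continent on the Earth's northern and southern hemispheres"},
        "de": {"language": "de",
               "value": "Kontinent auf der Nord- und Südhalbkugel der Erde"},
    },
}

summary = summarize_entity(q15, ["en", "de", "tr"])
```

A language with no entry (here `tr`) yields an empty record rather than an error, which keeps missing translations visible to the coders.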

Wikidata makes it possible to formalize a procedure for building dictionaries and to objectify the choices of entities and translations made by the expert coders (us). It should lead to the construction of “specialized” dictionaries for the analysis of geographical entities, through discussion between the native speakers of the different languages in the project.

Multilanguage definitions

The specificity of the Wikidata ontology is that it is a multilingual web in which Q15 is a node present in different linguistic layers. This means that Q15 has no single name or single definition, unless we choose the English language as reference. Depending on the context (i.e. the language or sub-language), Q15 could be defined as:

language  definition
fr        A continent named “Afrique”
en        A “continent on the Earth’s northern and southern hemispheres” named “Africa” or “African continent”
de        A “Kontinent auf der Nord- und Südhalbkugel der Erde” named “Afrika”
tr        A “Dünya’nın kuzey ve güney yarıkürelerindeki bir kıta” named “Afrika” or “Afrika kıtası”
ar        “The second largest continent in the world in terms of area and population, second only to Asia” (translated)

Correspondence between entities?

Sharing the same Wikidata entity code does not guarantee any concordance between the geographical objects found in news published in different languages or different countries. But, and this is the important point, it helps us to identify similarities and differences between sets of geographical entities that are more or less comparable in each language.

Cross-language perspective

Keeping in mind the limits of the equivalence of entities across languages, it can nevertheless be an interesting experiment to select a set of Wikidata entities (Q15, Q258, Q4412 …) and to examine their relative frequency in our media from different countries and in different languages. A typical hypothesis could be something like:

which is not equivalent to the question

but rather equivalent to the two joint questions

Workflow in a nutshell

We propose a semi-automatic method for extracting entities in different languages, which requires the presence of a human expert at each step of the analysis. The figure below describes an example of a search for world regions related to Africa in three languages.

The programs used for the implementation are explained in the media cookbook on GitHub, with an example of implementation available on the following page.
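To make the automatic part of such a workflow concrete, here is a minimal sketch of dictionary-based tagging. It is not the project's actual program: the function name `tag_text` and the whole-word, case-insensitive matching rule are assumptions made for illustration. The labels and codes come from the top-20 table in section 3, and the headline from the Data table.

```python
import re


def tag_text(text: str, dictionary: dict[str, str]) -> list[str]:
    """Return the entity codes whose label occurs in the text, by first occurrence."""
    first_position: dict[str, int] = {}
    for label, code in dictionary.items():
        # Whole-word match, so that "Europe" does not fire inside "European".
        match = re.search(rf"\b{re.escape(label)}\b", text, flags=re.IGNORECASE)
        if match and match.start() < first_position.get(code, len(text) + 1):
            first_position[code] = match.start()
    return sorted(first_position, key=first_position.get)


# Small French dictionary: label -> Wikidata code.
french_dictionary = {"Europe": "Q46", "Afrique": "Q15", "Asie": "Q48"}

headline = "Asie, Afrique, Europe: la nouvelle stratégie de l’État islamique"
codes = tag_text(headline, french_dictionary)
```

String matching of this kind cannot resolve homonyms, which is one reason why each step still needs to be checked by a human expert.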

3. EXPERIMENTS

We tested the workflow described above on an arbitrary selection of 110 entities:

  1. 65 entities related to continents and “natural” divisions of the Earth
  2. 45 regional organizations mentioned by Wikimedia: NATO, EU, CIS, NAFTA, …

Warning: the quality of this analysis is not fully guaranteed, because:

  1. The list of entities has not been discussed by the IMAGEUN partners
  2. The dictionaries established in the different languages have not been checked by native speakers

The purpose is therefore only to provide food for thought.

Data

We start from a corpus of texts in which the target Wikidata entities have been recognized:

text source date regs nbregs
Europa und Südamerika: EU und Mercosur beschließen weltweit größte Freihandelszone de_DEU_suddeu 2019-06-28 Q46 Q18 Q458 Q4264 4
‘Rolling emergency’ of locust swarms decimating Africa, Asia and Middle East en_GBR_guardi 2020-06-08 Q15 Q48 Q7204 3
Asie, Afrique, Europe: la nouvelle stratégie de l’État islamique fr_FRA_figaro 2019-05-03 Q48 Q15 Q46 3
Meteoroloji’den Marmara, İç Ege, Akdeniz, İç Anadolu, Batı ve Orta Karadeniz, Doğu ve Güneydoğu Anadolu bölgeleri için sağanak uyarısı tr_TUR_yenisa 2020-02-21 Q4918 Q12824780 Q166 3
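The rows of this table can be read as small records. The sketch below assumes, from the codes shown above, that `source` follows a `language_country_media` pattern and that `regs` is a space-separated list of Wikidata codes; the names `TaggedNews` and `make_news` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaggedNews:
    """One news item with the entities recognized in its text."""
    text: str
    language: str
    country: str
    media: str
    date: str
    regs: list[str]

    @property
    def nbregs(self) -> int:
        """Number of recognized entities (the nbregs column)."""
        return len(self.regs)


def make_news(text: str, source: str, date: str, regs: str) -> TaggedNews:
    """Split the source code and the entity list of one table row."""
    language, country, media = source.split("_")
    return TaggedNews(text, language, country, media, date, regs.split())


guardian = make_news(
    "‘Rolling emergency’ of locust swarms decimating Africa, Asia and Middle East",
    "en_GBR_guardi",
    "2020-06-08",
    "Q15 Q48 Q7204",
)
```

Keeping the language and the country as separate fields makes it possible to group media either by language or by country in the later steps.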

Experiment 1: An inter-language analysis of lexical universes

Experiment 1: Europe/EU (Q46 / Q458)

Experiment 1: Mediterranean Sea (Q4918)

Experiment 2: A cross-language analysis of regional entities

Experiment 2: Data aggregation

For experiment 2, we create a new object called a hypercube, from which the text of the news has been removed. We keep only the number of tags (tags) and the proportion of news (news) mentioning one or several regions (where1, where2), by media (who) and by time period (when).

who            when        where1  where2  tags  news
fr_FRA_figaro  2019-01-01  Q46     Q15     2     0.3611111
fr_FRA_figaro  2020-01-01  Q46     Q15     2     0.5000000
fr_FRA_figaro  2021-01-01  Q46     Q15     1     0.2500000
de_DEU_frankf  2021-01-01  Q46     Q15     1     0.2500000
de_DEU_suddeu  2020-01-01  Q46     Q15     1     0.2500000
en_GBR_telegr  2020-01-01  Q46     Q15     1     0.2500000
en_IRL_irtime  2019-01-01  Q46     Q15     1     0.2500000
en_IRL_irtime  2020-01-01  Q46     Q15     1     0.2500000
en_IRL_irtime  2021-01-01  Q46     Q15     1     0.2500000
tr_TUR_cumhur  2020-01-01  Q46     Q15     1     0.2500000
tr_TUR_yenisa  2021-01-01  Q46     Q15     1     0.2500000
ar_TUN_babnet  2021-01-01  Q46     Q15     1     0.2500000
fr_TUN_ecomag  2019-01-01  Q46     Q15     1     0.2500000
ar_DZA_elkahb  2021-01-01  Q46     Q15     1     0.2500000
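The aggregation can be sketched as follows. The weighting rule is an assumption, not something stated in this note: each news item with k distinct regions is given a total weight of 1, shared equally between its k × k ordered pairs of regions, so that a pair receives 1/k². This rule is at least consistent with values such as 0.25 (one item with two regions) and 0.3611111 (one item with two regions plus one with three) in the table above. Aggregating dates by year is also inferred from the `when` column, and the function name `build_hypercube` is hypothetical.

```python
from collections import defaultdict
from itertools import product


def build_hypercube(news_items):
    """Aggregate (who, date, regs) items into cells keyed by (who, when, where1, where2)."""
    cube = defaultdict(lambda: {"tags": 0, "news": 0.0})
    for who, date, regs in news_items:
        regions = sorted(set(regs))
        if not regions:
            continue  # nothing to count for items without recognized regions
        when = f"{date[:4]}-01-01"        # assumed yearly time period
        weight = 1 / len(regions) ** 2    # assumed rule: 1/k² per ordered pair
        for where1, where2 in product(regions, repeat=2):
            cell = cube[(who, when, where1, where2)]
            cell["tags"] += 1
            cell["news"] += weight
    return dict(cube)


items = [
    # Row taken from the Data table above.
    ("fr_FRA_figaro", "2019-05-03", ["Q48", "Q15", "Q46"]),
    # Invented item, added only to illustrate the sum of weights.
    ("fr_FRA_figaro", "2019-09-15", ["Q46", "Q15"]),
]

cube = build_hypercube(items)
cell = cube[("fr_FRA_figaro", "2019-01-01", "Q46", "Q15")]
```

Under these assumptions the cell for Le Figaro, 2019, Q46 and Q15 holds 2 tags and a news weight of 1/9 + 1/4.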

Experiment 2: Top 20 regions in full corpus

We first present a table of the most frequent entities in the whole corpus of newspapers.

rank  id        de                  en                 fr                                           tr              nb
1     Q458      Europäische Union   European Union     Union européenne                             Avrupa Birliği  5148
2     Q46       Europa              Europe             Europe                                       Avrupa          4735
3     Q15       Afrika              Africa             Afrique                                      Afrika          2349
4     Q4918     Mittelmeer          Mediterranean Sea  mer Méditerranée                             Akdeniz         917
5     Q7184     NATO                NATO               Organisation du traité de l’Atlantique Nord  NATO            438
6     Q166      Schwarzes Meer      Black Sea          mer Noire                                    Karadeniz       402
7     Q7204     Mittlerer Osten     Middle East        Moyen-Orient                                 Orta Doğu       359
8     Q48       Asien               Asia               Asie                                         Asya            327
9     Q66065    Sahelzone           Sahel              Sahel                                        Sahel           286
10    Q1286     Alpen               Alps               Alpes                                        Alpler          170
11    Q25322    Arktis              Arctic             Arctique                                     Arktika         168
12    Q97       Atlantischer Ozean  Atlantic Ocean     océan Atlantique                             Atlas Okyanusu  158
13    Q98       Pazifischer Ozean   Pacific Ocean      océan Pacifique                              Büyük Okyanus   157
14    Q28227    Maghreb             Maghreb            Maghreb                                      Magrip          152
15    Q6583     Sahara              Sahara             Sahara                                       Sahra           146
16    Q12585    Lateinamerika       Latin America      Amérique latine                              Latin Amerika   122
17    Q51       Antarktika          Antarctica         Antarctique                                  Antarktika      106
18    Q2841453  Amazonien           Amazonia           Amazonie                                     NA              104
19    Q48214    Naher Osten         Near East          Proche-Orient                                Yakın Doğu      88
20    Q664609   Karibik             Caribbean          Caraïbes                                     Karayipler      88
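A ranking of this kind can be obtained by counting tags. The sketch below is an illustration on the three headlines of the Data table that are tagged with continents, not a reproduction of the figures above; the function name `top_regions` is hypothetical, and computing the percentage as a share of all tags is an assumption about how the pct columns of the following tables may have been derived.

```python
from collections import Counter


def top_regions(tag_lists, labels, n=10):
    """Return (label, count, share in %) for the n most frequent entity codes."""
    counts = Counter(code for regs in tag_lists for code in regs)
    total = sum(counts.values())
    return [
        (labels.get(code, code), count, round(100 * count / total, 1))
        for code, count in counts.most_common(n)
    ]


# Entity lists of three rows of the Data table.
tag_lists = [
    ["Q46", "Q18", "Q458", "Q4264"],
    ["Q15", "Q48", "Q7204"],
    ["Q48", "Q15", "Q46"],
]

# English labels taken from the top-20 table; unknown codes keep their code.
english_labels = {"Q46": "Europe", "Q15": "Africa", "Q48": "Asia",
                  "Q458": "European Union", "Q7204": "Middle East"}

ranking = top_regions(tag_lists, english_labels, n=3)
```

Passing the label dictionary of another language gives the same ranking with names in that language, since the counting is done on the codes.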

Experiment 2: Turkish newspapers - Top 10 regions

rank  Cumhuriyet region  pct   Yeni Şafak region  pct
1     Avrupa             49.3  Avrupa             37.2
2     Akdeniz            13.1  Akdeniz            17.6
3     NATO               10.4  Karadeniz          12.8
4     Karadeniz          8.7   NATO               12.6
5     Afrika             3.3   Afrika             5.0
6     Avrupa Birliği     2.3   Avrupa Birliği     2.5
7     Asya               2.3   Asya               2.2
8     Avrasya            1.9   Avrasya            1.8
9     Antarktika         1.2   Orta Doğu          1.1
10    Avrupa Konseyi     1.0   Antarktika         1.0

Experiment 2: German newspapers - Top 10 regions

rank  FAZ region         pct   Süd. Zeit. region  pct
1     Europa             36.7  Europäische Union  39.7
2     Europäische Union  34.1  Europa             29.0
3     Afrika             4.4   Afrika             5.0
4     Mittelmeer         3.2   Mittlerer Osten    4.8
5     Asien              2.6   Mittelmeer         4.4
6     Alpen              2.2   Alpen              2.9
7     Mittlerer Osten    1.6   Naher Osten        1.5
8     Osteuropa          1.3   Südamerika         1.3
9     Balkanhalbinsel    1.1   Lateinamerika      1.0
10    Südostasien        0.9   Arktis             1.0

Experiment 2: French newspapers - Top 10 regions

rank  Figaro region        pct   Le Monde region   pct
1     Union européenne     29.8  Afrique           26.4
2     Europe               25.5  Europe            25.8
3     Afrique              9.4   Sahel             8.5
4     mer Méditerranée     4.9   mer Méditerranée  7.3
5     Sahel                3.0   Proche-Orient     3.6
6     Amazonie             2.9   Moyen-Orient      3.3
7     Alpes                2.3   Amazonie          2.2
8     Polynésie            2.1   Polynésie         2.2
9     Moyen-Orient         1.8   Alpes             2.1
10    Conseil de l’Europe  1.4   Sahara            1.8

Experiment 2: UK newspapers - Top 10 regions

rank  Guardian region  pct   Daily Telegraph region   pct
1     European Union   38.5  European Union           44.9
2     Europe           21.2  Europe                   22.9
3     Africa           12.1  Africa                   17.8
4     Arctic           4.0   Asia                     2.2
5     Middle East      3.9   Middle East              1.3
6     Pacific Ocean    3.5   Arctic                   1.1
7     Atlantic Ocean   2.4   Pacific Ocean            1.1
8     Asia             1.8   Commonwealth of Nations  0.9
9     Latin America    1.7   South China Sea          0.8
10    Antarctica       1.5   Caribbean                0.8

Experiment 2: Irish newspapers - Top 10 regions

rank  Irish Times region  pct   Belfast Telegraph region  pct
1     European Union      62.3  European Union            54.2
2     Europe              17.9  Africa                    16.2
3     Africa              8.2   Europe                    15.6
4     Atlantic Ocean      1.3   Atlantic Ocean            2.2
5     Asia                1.2   Arctic                    1.8
6     Pacific Ocean       1.2   Commonwealth of Nations   1.8
7     Middle East         1.2   Middle East               1.4
8     Maghreb             0.6   Asia                      1.4
9     Alps                0.6   Caribbean                 1.0
10    Latin America       0.5   Pacific Ocean             0.8

Experiment 2: Tunisian newspapers - Top 5 regions

rank  Babnet (ar) region  pct   Econ. Mag region  pct   La Presse region  pct   Réalités region   pct
1     Afrique             32.8  Afrique           30.9  Afrique           31.4  Afrique           39.3
2     Union européenne    24.0  Maghreb           13.3  mer Méditerranée  29.7  Europe            13.5
3     Europe              15.9  mer Méditerranée  12.7  Sahel             15.1  Sahel             9.2
4     Maghreb             4.8   Europe            11.3  Maghreb           5.2   Maghreb           8.6
5     Sahel               4.8   Union européenne  11.0  Europe            4.7   mer Méditerranée  8.0

Experiment 2: Algerian newspapers - Top 10 regions

rank  Al Nahar (ar) region                         pct   El Khabar (ar) region                        pct   El Watan (fr) region  pct
1     Afrique                                      40.1  Afrique                                      55.9  Sahara                36.8
2     Europe                                       27.5  Europe                                       19.5  Afrique               17.3
3     Union européenne                             11.6  Union européenne                             7.6   Sahel                 13.6
4     Sahel                                        4.3   mer Méditerranée                             3.6   Maghreb               10.5
5     Asie                                         3.9   Asie                                         3.2   Europe                4.5
6     Ligue arabe                                  2.5   Moyen-Orient                                 2.6   Proche-Orient         3.6
7     Moyen-Orient                                 2.4   Maghreb                                      1.8   mer Méditerranée      2.7
8     mer Méditerranée                             2.0   Sahel                                        1.8   Moyen-Orient          2.7
9     Maghreb                                      1.2   Ligue arabe                                  1.2   Afrique du Nord       2.3
10    Organisation du traité de l’Atlantique Nord  1.2   Organisation du traité de l’Atlantique Nord  0.8   Ligue arabe           1.8

Experiment 2: Correspondence analysis - Factors 1-2

N.B. We have eliminated the units “Americas,” “Europe” and “European Union”
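Correspondence analysis compares the rows of a contingency table (here, media) through their profiles, using the chi-square distance. The sketch below computes only that distance, not the factors themselves, and it is not the code behind the figures of this note. The function name `chi2_distance` is hypothetical and the counts are invented toy values.

```python
from math import sqrt


def chi2_distance(table, row_a, row_b):
    """Chi-square distance between the profiles of two rows of a count table.

    table maps each media to a dict {region code: number of tags}.
    """
    regions = sorted({region for row in table.values() for region in row})
    grand_total = sum(sum(row.values()) for row in table.values())
    # Mass of each column: its share of all tags in the table.
    column_mass = {
        r: sum(row.get(r, 0) for row in table.values()) / grand_total
        for r in regions
    }

    def profile(name):
        total = sum(table[name].values())
        return {r: table[name].get(r, 0) / total for r in regions}

    pa, pb = profile(row_a), profile(row_b)
    return sqrt(sum(
        (pa[r] - pb[r]) ** 2 / column_mass[r]
        for r in regions
        if column_mass[r] > 0
    ))


# Invented counts for three hypothetical media and two regions.
toy_table = {
    "media_A": {"Q15": 10, "Q48": 10},
    "media_B": {"Q15": 10, "Q48": 10},
    "media_C": {"Q15": 20, "Q48": 0},
}

same = chi2_distance(toy_table, "media_A", "media_B")
different = chi2_distance(toy_table, "media_A", "media_C")
```

Two media with identical profiles are at distance zero whatever their size. Very frequent regions shift the column masses and the average profile, which is one possible reason for setting aside units such as “Europe” and “European Union”.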

Experiment 2: Correspondence analysis - Factors 3-4

Experiment 2: Cluster analysis (world regions)

Experiment 2: Cluster analysis (media)

Bibliography

Brennetot, Arnaud, and Muriel Rosemberg. 2013. “Géographie de l’Europe et géographie de la construction européenne.” L’Espace Politique, no. 19 (April). https://doi.org/10.4000/espacepolitique.2613.
Cholley, André. 1939. “Régions naturelles et régions humaines.” L’information géographique 4 (2): 40–42. https://doi.org/10.3406/ingeo.1939.5013.
Copeaux, Étienne. 1997. “Chapitre III. Les Instances de Production Du Discours Historique Scolaire.” In, 103–16. CNRS Éditions. https://doi.org/10.4000/books.editionscnrs.35363.
Grataloup, Christian. 2011. “La fausse neutralité des continents.” Revue internationale et stratégique 82 (2): 97. https://doi.org/10.3917/ris.082.0097.
Lewis, Martin W., and Kären Wigen. 2019. The Myth of Continents. University of California Press. https://doi.org/10.1525/9780520918597.